Abstract

This  paper  proposes  the  use  of  VLSI  technology  to  perform  relational  database  operations 
directly  in  hardware.  It  is  shown  that  relational  computations,  such  as  intersection, 
remove-duplicates,  union,  join,  and  division,  can  all  be  pipelined  elegantly  and  efficiently  on 
networks  of  processors  having  an  array  structure.  These  (systolic)  processor  arrays  are 
readily  and  cost-effectively  implementable  with  present  technology,  due  to  the  extreme 
simplicity of their processors, and the high regularity of their interconnection structures.

1. Introduction

LSI  technology  allows  tens  of  thousands  of  devices  to  fit  on  a  single  chip;  VLSI  technology 
promises  an  increase  of  this  number  by  at  least  one  or  two  orders  of  magnitude  in  the  next 
decade.  This  paper  proposes  one  method  of  exploiting  this  technology  advance:  the 
construction  of  special-purpose  VLSI  chips  for  relational  database  operations.  These 
special-purpose chips are to be attached to a conventional host computer, or used as a
component in a larger special-purpose system, such as a database machine. (We suggest one
such  database  machine  at  the  end  of  this  paper.) 

In [5] a structure called a systolic array* is proposed for implementation in VLSI. These
arrays  of  processors  have  the  following  desirable  properties: 

1.  They  can  be  designed  and  implemented  with  only  a  few  different  types  of  simple 
cells. 

2.  The  array's  data  and  control  flow  is  simple  and  regular,  so  that  cells  can  be 
connected  by  a  network  with  local  and  regular  interconnections.  Long  distance 
or  irregular  communication  is  not  needed. 

3.  The  array  uses  extensive  pipelining  and  multiprocessing.  Typically,  several  data 
streams  move  at  constant  velocity,  over  fixed  paths  in  the  network,  interacting 
where  they  meet.  In  this  fashion,  a  large  proportion  of  the  processors  in  the 
array  can  be  kept  active,  so  that  the  array  can  sustain  a  high  rate  of  data  flow. 

VLSI designs based on systolic arrays tend to be simple (a consequence of property 1),
modular (property 2), and of high performance (property 3). For more discussion of the
attractiveness of the systolic array approach, see [3]. In the present paper we illustrate the
use  of  systolic  arrays  in  performing  relational  database  operations. 

In  section  2  we  give  details  concerning  the  notion  of  systolic  arrays,  and  present  some 
concepts  and  notation  for  discussing  relational  database  operations.  In  section  3,  we  describe 
the  basic  building  block  of  several  of  our  systolic  arrays:  a  systolic  processor  array  to 
compare  two  tuples.  Section  4  includes  a  detailed  systolic  example:  an  array  to  rapidly 
perform  the  intersection  (or  difference)  operation  on  two  relations.  In  section  5  we  use  an 
array identical to the intersection/difference array, to remove duplicates from a collection of
tuples, and to perform the operations of union and projection on relations. In sections 6 and
7 we detail relational operations (join and division) that are substantially different from the
intersection-like operations, but still lend themselves to simple implementation with systolic
arrays. Section 8 remarks on some implementation and performance aspects of the systolic
arrays proposed in this paper. Section 9 discusses the architectural issues of an integrated
system capable of using many types of systolic arrays.

*The word "systole" was borrowed from physiologists, who use it to refer to the rhythmically recurrent
contractions of the heart, which pulse blood through the body. For a systolic array, the action of a cell or processor is
analogous to that of the heart. Each cell regularly pumps data in and out (performing some short computation before
each "contraction"), so that a regular flow of data is kept up in the network. Many systolic arrays have been designed
recently, and are surveyed in [7].

2. Systolic Arrays and Relational Database Considerations

2.1  Systolic  Arrays 

Regular  geometric  structures  are  typically  used  in  systolic  arrays.  For  the  present  paper 
we  use  predominantly  orthogonally  and  linearly  connected  arrays  of  processors  (both  of 
which  are  shown  in  figure  2-1),  although  hexagonally  connected  arrays  as  in  [5]  would  work 
as  well  in  many  instances. 


Figure  2-1:  Orthogonally  and  linearly  connected  processor  arrays. 

We  find  that  these  arrays  facilitate  many  relational  database  operations  by  allowing  swift 
interaction  among  the  tuples  of  two  relations,  with  a  set  of  temporary  results  also  traveling 
through the array. Typically, the relations move top-to-bottom and bottom-to-top, and the
temporary results move left-to-right. All of the data in the array moves synchronously. As a
piece of data passes through a processor, it may have some computation performed on it;
then it is passed on to the next processor. The final results of the array are sent out one side
of the array.

2.2  Processors 

In  figure  2-2  we  show  the  prototype  for  the  processor  used  in  the  orthogonally  or  linearly 
connected  systolic  structure.  The  processor  has  three  input  lines  and  three  output  lines.  For 
each  "pulse"  of  the  systolic  array,  inputs  come  in  on  the  input  lines,  and  outputs  leave  the 
processor on the output lines. In the intervening time, all of the work (computation) of the
processor is performed: the processor computes some simple transformation on the data
which it has just received, in preparation for shipping it out at the next pulse.

Figure 2-2: Orthogonal and linear processor prototypes.

VLSI arrays are greatly simplified if most processors in the array are identical. This is the case for the
arrays  presented  in  this  paper.  Given  the  orthogonally  or  linearly  connected  array  structure, 
and  the  processor  prototype  described  here,  it  is  the  algorithm  actually  executed  by  each 
processor  that  determines  the  function  of  the  array.  Therefore,  to  define  a  systolic  array  to 
perform  a  specific  relational  operation,  we  specify  the  algorithm  for  the  processors  in  a 
systolic  array.  The  sections  below  consist  of  such  specifications  and  an  explanation  of  how 
they  actually  produce  the  desired  result. 


2.3  Representation  of  Relations 

In  the  following  discussion,  we  assume  some  familiarity  with  the  basics  of  relational 
database theory (see, for example, [1, 2]). A relation is a set of tuples. Each tuple consists
of  an  ordered  sequence  of  elements.  It  is  these  elements  that  are  fed  through  our  systolic 
arrays.  The  tuples  in  a  relation,  however,  are  not  necessarily  ordered  in  any  particular 
fashion. 

In a relation, an element can be of any data type: an integer, a boolean value, a string, etc.
We  wish  to  give  all  of  these  a  uniform  representation,  in  order  to  simplify  the  design  of 
systolic  arrays  to  process  relations.  The  assumption  we  make  is  a  common  one  in  the 
implementation  of  relational  database  systems.  We  assume  that  the  elements  from  any 
particular  column  in  a  relation  are  selected  only  from  one  underlying  domain.  Each  member  of 
the domain is uniquely and reversibly encoded into an integer. These integer encodings are
the form in which the elements are stored in the relations, and the list of encodings is stored
separately. Whenever necessary, the integers are decoded into the appropriate values;
however, encoding and decoding are usually only necessary for input or output: that is, for
use by humans. Most relational operations are logically the same whether they operate on
integers or, say, strings or calendar dates. Since, for our purposes, integer operations
are more convenient, we assume that relations are stored as tuples of integers (and we are
not concerned with encoding and decoding).
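
As a minimal sketch of the encoding assumption above, the following Python fragment assigns each distinct member of a domain an integer code and keeps the list of encodings separately. The function name `build_encoding` and the sample data are hypothetical; the text only requires that the encoding be unique and reversible.

```python
def build_encoding(domain_values):
    """Assign each distinct domain value a unique integer code.

    Returns (encode, decode): a dict from value to code, and a list that
    maps each code back to its value, so the encoding is reversible.
    """
    encode = {}
    decode = []
    for value in domain_values:
        if value not in encode:
            encode[value] = len(decode)
            decode.append(value)
    return encode, decode


# Illustrative column of names (made-up data).
names = ["Smith", "Jones", "Smith", "Lee"]
encode, decode = build_encoding(names)

# The relation stores only the integers.
stored_column = [encode[v] for v in names]

# Decoding is needed only for output to humans.
recovered = [decode[code] for code in stored_column]
```

Equality tests on the stored integers give the same answers as equality tests on the original values, which is all the arrays in this paper rely on.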


2.4  Union-Compatibility 

Certain  relational  operations  such  as  union  and  intersection  can  only  be  performed 
between relations that are union-compatible. Two relations are said to be union-compatible
if  the  following  two  conditions  hold: 

- They have the same number of columns (and thus tuples from the two relations
have the same number of entries).

-  Corresponding  columns  from  the  two  relations  have  entries  drawn  from  the  same 
underlying  domain. 

This  definition  is  an  attempt  to  capture  the  informal  notion  that  a  tuple  from  one  relation 
could  legally  be  a  member  of  the  other  relation,  in  that  the  respective  columns  of  the  two 
relations are defined on the same domains.
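
The two conditions can be checked mechanically. The sketch below assumes, purely for illustration, that each relation is described by a list of domain names, one per column; this representation and the name `union_compatible` are not part of the paper.

```python
def union_compatible(domains_a, domains_b):
    """Return True when two relations are union-compatible.

    domains_a, domains_b: lists of domain names, one entry per column
    (an assumed representation of a relation's column domains).
    """
    same_width = len(domains_a) == len(domains_b)
    same_domains = all(da == db for da, db in zip(domains_a, domains_b))
    return same_width and same_domains


# Illustrative schemas (made-up domain names).
employees = ["name", "salary", "department"]
retirees = ["name", "salary", "department"]
projects = ["project", "department"]
```

Here `union_compatible(employees, retirees)` holds, while `union_compatible(employees, projects)` does not, since the column counts differ.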


2.5  Multi-relations 

A  multi-relation  is  an  extension  of  the  concept  of  a  relation  in  which  duplicate  tuples  are 
allowed.  (This  is  by  analogy  with  the  term  "multi-set,"  since  a  relation  can  be  viewed  as  a 
set  of  tuples.)  This  is  a  notion  that  we  will  find  useful  later  in  the  paper.  Multi-relations  are 
usually  generated  as  the  intermediate  results  of  relational  operations.  For  example,  suppose 
we  remove  a  few  columns  from  a  relation  (which  is  the  projection  operation).  The 
intermediate  construct  we  obtain  before  we  remove  duplicate  tuples  to  produce  the  new 
(result)  relation  is  a  multi-relation. 


2.6  Notation 

We briefly summarize the notation used in the remainder of the paper. Relations and
multi-relations are denoted by capital letters: A, B, C. Tuples that are members of these are
denoted by subscripted lower-case letters. The ith tuple of A is denoted by $a_i$, or by $a_i \in A$ if
we wish to indicate membership. In turn, elements in tuples are double-subscripted: $a_{i,k}$ is
the kth element of $a_i$, and the whole tuple can be exhibited as
$a_i = \langle a_{i,1}, a_{i,2}, \ldots, a_{i,m} \rangle$. The
letter n is usually used to denote the number of tuples in a relation (the cardinality of the
relation, since a relation is a set): $|A| = n$. The letter m usually designates the number of
elements in a tuple in the relation in question.

Letter T represents a boolean matrix that contains results of logical operations. The
(i,j)-th entry of T, $t_{ij}$, is usually used to denote the result of a comparison between the ith
tuple of a relation and the jth tuple of another. Where we wish to display the formation of
$t_{ij}$ over time, we use the notation $t_{ij}^{k}$ for the result after the kth time step;
$t_{ij}^{\text{initial}}$ and $t_{ij}^{\text{final}}$
denote specific instances (the first and the last) of $t_{ij}^{k}$. (When no confusion will thereby
result, we use the same notation $t_{ij}$ to refer to $t_{ij}^{k}$ for any k.) Finally, the notation $t_i$ is used
to designate the result of some logical operation on all of the members of the ith row of T,
for example, the OR or AND of $t_{ik}$, for all k.

3. Arrays for Tuple Comparison

In  several  of  the  relational  operations  described  below,  it  is  necessary  to  test  for  equality 
between a pair of tuples, one from each of two relations. (Two tuples, $a_i \in A$ and $b_j \in B$, where
A and B are union-compatible relations or multi-relations, are said to be equal if and only if
element $a_{i,k}$ equals element $b_{j,k}$ for $1 \le k \le m$.) For example, in the intersection operation,
the intersection of two relations, say A and B, consists of those tuples which are in both A
and B. Forming this intersection, then, requires many tests for equality between tuples, $a_i \in A$
and $b_j \in B$. In this section, we first describe a linear systolic array of processors capable of
performing  one  such  comparison.  We  then  combine  many  copies  of  this  basic  structure  to 
form  a  two-dimensional  systolic  array  that  can  pipeline  many  tuple  comparisons. 


3.1  Linear  Comparison  Array  for  Performing  One  Tuple  Comparison 


Figure 3-1: Tuple comparison array.


A  tuple  comparison  can  be  done  by  the  linear  array  of  processors  in  Figure  3-1.  A  single 
processor  from  the  array  is  shown  in  more  detail  in  Figure  3-2.  One  can  see  that  the 
processor  array  in  Figure  3-1  is  able  to  compute  the  AND  of  the  comparison  results  from  all 
of the individual element comparisons. More precisely, at each step the kth processor (from
the left) in the array compares the two elements $a_{i,k}$ and $b_{j,k}$, and outputs on its output line
$t_{OUT}$ the AND of this comparison result with the input to the processor on input line $t_{IN}$
(which is the output of the (k-1)st processor). Thus, if the input to the left-most processor is
the value TRUE, then, by induction, after m time steps the output at the right-most processor
of the processor array will be a bit indicating whether the two tuples are equal. That is, this
output will be TRUE if and only if all of the comparisons of individual elements produced
TRUE. (Notice also that if the initial input is FALSE, then the output at the right side of the
array is guaranteed to be FALSE. Surprisingly, this fact will be useful in later sections of the
paper.)

    t_OUT <- t_IN AND (a_IN = b_IN)
    a_OUT <- a_IN
    b_OUT <- b_IN

Figure 3-2: Individual comparison processor.

To  make  this  all  work,  all  of  the  data  must  be  in  the  right  place  at  the  right  time.  This  is 
why the inputs to the individual processors are "staggered" (as shown by the "slanted" input
tuples in figure 3-1) so that elements $a_{i,k}$ and $b_{j,k}$ arrive at the kth processor and are
compared at the kth time step. Also at that time the AND of the results of previous
comparisons arrives at the same processor, so that it can be ANDed with the new comparison
result at the processor.

We  summarize  the  function  of  the  linear  comparison  array  shown  in  figure  3-1.  This  array 
compares  two  tuples  (presumably  one  from  each  of  two  relations),  and  forms  the  result  of  the 
comparison  by  propagating  intermediate  versions  of  that  result  to  the  right  through  the 
array.  By  staggering  entries  from  the  tuples  one  can  assure  that  the  output  from  the 
right-most  processor  of  the  array  will  be  the  result  of  the  equality  test  on  the  two  tuples. 
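
The behaviour of the linear comparison array can be sketched in software. The following Python fragment is only an illustration under simplifying assumptions: each loop iteration stands for one pulse, and the staggering of the inputs is modelled by letting elements $a_{i,k}$ and $b_{j,k}$ reach the kth cell at the kth iteration. The function names are ours.

```python
def compare_processor(t_in, a_in, b_in):
    """One cell: t_OUT <- t_IN AND (a_IN = b_IN); a and b pass through."""
    t_out = t_in and (a_in == b_in)
    return t_out, a_in, b_in


def linear_compare(a, b, t_initial=True):
    """Simulate an m-cell linear comparison array on tuples a and b.

    Iteration k models the pulse at which the staggered elements a[k]
    and b[k] meet in cell k, together with the partial result arriving
    from the cell to the left.
    """
    if len(a) != len(b):
        raise ValueError("tuples must have the same number of elements")
    t = t_initial
    for k in range(len(a)):
        t, _, _ = compare_processor(t, a[k], b[k])
    return t
```

With `t_initial=True` the result is the equality test on the two tuples; with `t_initial=False` the result is FALSE regardless of the tuples, which is the property noted at the end of the discussion above.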


3.2 Two-Dimensional Comparison Array for Pipelining Many Tuple Comparisons


Figure 3-3: Two-dimensional (orthogonal) comparison array.

We  concatenate,  vertically,  several  of  the  linear  comparison  arrays  described  above,  to 
form  a  2-dimensional  processor  array,  as  shown  in  Figure  3-3.  This  orthogonally  connected, 
2-dimensional processor array can perform many tuple comparisons in parallel. To
accomplish this, we feed the relations A and B into the array, from the top and bottom,
respectively.

- We feed the relations at times such that the elements of any given tuple, say $a_i$,
are "staggered," so that the element $a_{i,k}$ enters the array one time step before
the element $a_{i,k+1}$. This has the effect of staggering the inputs to each of the
component linear arrays, so that each will perform exactly as the single linear array
described above.

- We pipeline tuples in each relation through the orthogonal processor array, in
such a way that each tuple is two steps behind the tuple that preceded it into
the array. This assures that any particular pair of tuples $a_i \in A$ and $b_j \in B$ will
eventually cross each other. More specifically, $a_{i,1}$ first will meet $b_{j,1}$ in the
left-most processor of some row in the processor array. These two elements
will be compared, and the result of this comparison will be ANDed with the initial
input to that row of processors (TRUE for our present purposes). At the next
time step, as the tuples ripple through the array, element $a_{i,2}$ will meet $b_{j,2}$ in
the processor to the right, in the same row. They will be compared there, and
the result of the comparison will be ANDed with the output from the first
processor to produce the output of the second processor. Processing continues
in this fashion, and the intermediate boolean result of the ANDs propagates to
the right through that particular row of processors, until, as discussed above,
the right-most processor outputs a boolean value that indicates whether tuple
$a_i$ equals tuple $b_j$.

In Figure 3-4, the $t_{ij}$ represent intermediate values for the results of comparing tuples $a_i$
with tuples $b_j$. (Note that in the figure, the initial value for $t_{3,3}$ is just about to enter the
processor array.)


3.3  Matrix  Notation 

For convenience in discussion, we express the results produced by a comparison array in
the form of a matrix T. The elements of the matrix are defined as follows:

$$
t_{ij} =
\begin{cases}
\text{TRUE} & \text{if } t_{ij}^{\text{initial}} = \text{TRUE and } a_{i,k} = b_{j,k} \text{ for all } 1 \le k \le m, \\
\text{FALSE} & \text{otherwise.}
\end{cases}
$$

We see that it is these $t_{ij}$ that are produced at the right-most column of the array described
in Section 3.2.

In the following sections, we add additional processors which manipulate these $t_{ij}$'s after
they leave the comparison array. These manipulations will be shown to produce the
equivalent of relational operations.
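
To make the timing of the two-dimensional comparison array concrete, the following Python sketch steps through the pulses and records which elements meet in which cell. It rests on assumptions that are ours rather than statements from the text: both relations have n tuples of m elements, the array has 2n - 1 rows, element k of tuple i enters its column at pulse 2i + k (0-based), and every row receives TRUE as its initial input. The function name is hypothetical.

```python
def simulate_comparison_array(A, B):
    """Pulse-by-pulse model of the orthogonal comparison array.

    A moves down from the top row, B moves up from the bottom row.
    Returns the boolean matrix T with T[i][j] == (A[i] == B[j]).
    """
    n, m = len(A), len(A[0])
    assert len(B) == n and all(len(t) == m for t in A + B)
    rows = 2 * n - 1                      # assumed array height
    T = [[None] * n for _ in range(n)]
    partial = {}                          # (i, j) -> running AND in that row
    last_pulse = (rows - 1) // 2 + 2 * (n - 1) + (m - 1)
    for pulse in range(last_pulse + 1):
        for k in range(m):                # column k of the array
            for i in range(n):
                row_a = pulse - 2 * i - k             # position of A[i][k]
                if not 0 <= row_a < rows:
                    continue
                for j in range(n):
                    row_b = rows - 1 - (pulse - 2 * j - k)  # position of B[j][k]
                    if row_b != row_a:
                        continue
                    # The two elements meet in this cell at this pulse.
                    t_in = True if k == 0 else partial[(i, j)]
                    partial[(i, j)] = t_in and A[i][k] == B[j][k]
                    if k == m - 1:        # result leaves the right-most column
                        T[i][j] = partial[(i, j)]
    return T


A = [(1, 2), (3, 4), (5, 6)]
B = [(3, 4), (1, 2), (7, 8)]
T = simulate_comparison_array(A, B)
```

Under these assumptions the pair $a_i$, $b_j$ always meets in row (n - 1) + j - i, one column further to the right at each pulse, so every entry of T is produced exactly once.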
4. Arrays for Intersection: A Detailed Example

In  the  preceding  section,  we  saw  how  we  could  use  a  systolic  comparison  array  to  quickly 
do pairwise comparisons on sets of tuples. The results of these comparisons ($t_{ij}$) are sent
out  from  the  right  side  of  the  array.  By  examining  a  particular  relational  operation,  namely 
intersection,  in  some  detail,  we  illustrate  how  these  individual  results  are  combined  in 
applications. 

4.1  The  Intersection  Operation 

Consider the operation of finding the intersection of two union-compatible relations

$$C = A \cap B$$

The relation C consists of those tuples that are in both relation A and relation B. This is
exactly the same as finding those tuples in A which are also in B. Thus we need only examine
the tuples in A for membership in B. This is the basis for our "intersection array." We
compare each tuple $a_i \in A$ pairwise with each tuple $b_j \in B$. For each $a_i$, if $a_i$ matches some $b_j$,
then $a_i$ is a member of the intersection. This is where the comparison array described in the
preceding section comes in handy.


4.2  The  Intersection  Array 

The  intersection  array  for  performing  the  intersection  operation  consists  of  a 
(two-dimensional)  comparison  array  on  the  left  and  a  (linear)  accumulation  array  on  the  right 
(see  figure  4-1).  The  comparison  array  performs  comparisons  between  tuples  in  A  and  tuples 
in B, to produce the matrix T, whereas the accumulation array accumulates $t_{ij}$ to form

$$t_i = \bigvee_{1 \le j \le n} t_{ij} \qquad (4.1)$$

One can easily see that a tuple $a_i \in A$ is a member of the intersection, i.e. matches some
$b_j \in B$, if and only if $t_i$ is true.

Figure 4-1 illustrates how the intersection array computes the intersection of two 3x3
relations. Processors in the accumulation array are called accumulation processors; their
function is as follows. At each time step, an accumulation processor takes its left input (some
$t_{ij}$ from the comparison array), ORs that with the top input (some $t_i$), and passes on the
result as its output (the updated $t_i$) to the processor below. More specifically, a $t_i$ is formed
in the accumulation array in the following manner. First $t_{i,1}$ reaches an accumulation
processor from the comparison array on the left. At the next time step, this value is sent to
the accumulation processor below. During the same time step, $t_{i,2}$ is sent into that
accumulation processor from the left, and is ORed with $t_{i,1}$. Similarly, at the next time step,
the result of this OR is sent down one processor, and is ORed with $t_{i,3}$, which is just arriving
from the left. In an implementation, the first accumulation processor can be identical in
function to the others, provided we initialize the value moving down through the accumulation
array as FALSE (i.e., the initial value of $t_i$ is FALSE; in the figure, $t_3$ is about to enter the array with its
initial value). This value is successively ORed with all of the $t_{ik}$, for all k, and when it leaves
the bottom of the accumulation array, it takes on the value $t_i$ specified in equation (4.1). This
$t_i$ designates whether $a_i$ is a member of the intersection C, and it is then a simple matter to
use the $t_i$'s to generate C from A.

At any time step, accumulation processors that aren't busy (i.e. that have no $t_{ij}$ coming in
from the left) simply pass on the $t_i$ that they have. It takes less than the length of the
accumulation array to produce a $t_i$, but different $t_i$ are produced in different sub-arrays.

Figure 4-1: Intersection array, consisting of two modules:
(2-dim) comparison array on the left, and (1-dim) accumulation array on the right.
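
The logical effect of the intersection array (though not its timing) can be sketched as follows. This is a functional illustration only: the inner comparison stands in for one row of the comparison array, and the running OR stands in for the value moving down the accumulation array. The function name is ours.

```python
def intersection(A, B):
    """Return the tuples of A that also occur in B, as in equation (4.1)."""
    result = []
    for a in A:
        t_i = False                       # initial value entering the accumulation array
        for b in B:
            # Output of the comparison array for this pair of tuples.
            t_ij = len(a) == len(b) and all(x == y for x, y in zip(a, b))
            # Accumulation processor: OR the left input into the running value.
            t_i = t_i or t_ij
        if t_i:                           # a is a member of the intersection
            result.append(a)
    return result


A = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
B = [(4, 5, 6), (0, 0, 0), (1, 2, 3)]
C = intersection(A, B)
```

The tuples of C appear in the order in which they occur in A, since the $t_i$ are used only to select tuples from A.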

4.3  Remark 

We  have  illustrated  the  use  of  the  so-called  accumulation  array  at  the  right  of  the 
comparison  array  to  implement  a  desired  relational  operation,  namely,  the  intersection 
operation.  In  general,  as  shown  in  the  rest  of  the  paper,  only  simple  changes  in  the 
accumulation  array  or  in  the  input  data  are  required  to  alter  the  output  of  the  array  to 
produce  other  useful  functions.  The  main  "hardware"  —  the  comparison  array  —  is 
sufficiently  general  that  it  need  not  be  changed  at  all. 

As  an  illustration,  we  see  that  after  a  slight  modification  the  intersection  array  can  be  used 
to  perform  the  difference  operation  on  two  relations.  The  difference,  C,  of  two 
union-compatible relations A and B, denoted C = A - B, consists of those tuples that are
members of A, but are not members of B. When we compute the intersection with the
intersection array, we notice that $t_i$ is TRUE for any tuple $a_i$ that is in both A and B (i.e., in
$A \cap B$). We can also see that $t_i$ is FALSE for any $a_i$ that was in A, but not in B, which is
precisely the condition for being in the difference. Therefore, to form A - B, we can use
the intersection array, with the modification that the tuples in the resulting relation
correspond to those $t_i$'s which are FALSE, instead of TRUE. (Alternatively, we could just put
an inverter on the output line of the accumulation array.)
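
A minimal sketch of this modification, again modelling only the logic and not the timing: the membership bits $t_i$ are computed exactly as for intersection, and the difference keeps the tuples whose bit is FALSE. The helper names are hypothetical.

```python
def membership_bits(A, B):
    """t_i for each tuple of A: True when the tuple matches some tuple of B."""
    bits = []
    for a in A:
        t_i = False                       # initial value of the accumulated OR
        for b in B:
            t_i = t_i or (a == b)         # OR in the comparison result t_ij
        bits.append(t_i)
    return bits


def difference(A, B):
    """A - B: keep the tuples of A whose t_i is FALSE (the inverted output)."""
    return [a for a, t_i in zip(A, membership_bits(A, B)) if not t_i]


A = [(1, 2), (3, 4), (5, 6)]
B = [(3, 4), (9, 9)]
bits = membership_bits(A, B)
D = difference(A, B)
```

Selecting on `t_i` instead of `not t_i` gives the intersection, so the same bits serve both operations.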
5. Arrays for Removal of Duplicate Tuples


The operation remove-duplicates transforms a multi-relation (defined in section 2.5), A, into
a relation, A', which contains all of the tuples in A, except that no tuple is duplicated in A'.
The systolic array used for intersection in the last section can also be used for the operation
remove-duplicates. Instead of comparing relation A to relation B, we compare relation A to
itself, by feeding it into both the top and bottom of the array. (Note that A is
union-compatible with itself.) By doing so, we produce a matrix, T, whose elements are:

$$
t_{ij} =
\begin{cases}
\text{TRUE} & \text{if } t_{ij}^{\text{initial}} = \text{TRUE and } a_{i,k} = a_{j,k} \text{ for all } 1 \le k \le m, \\
\text{FALSE} & \text{otherwise.}
\end{cases}
$$

Our strategy for eliminating duplicate tuples from A is to remove all tuples that are
preceded by another tuple that equals them. For example, if tuples $a_3$, $a_{10}$, and $a_{21}$ are all
equal, then in producing A', we wish to remove $a_{10}$ and $a_{21}$ from A, leaving $a_3$ in A' (not
necessarily as $a'_3$ because, for example, $a_1$ might equal $a_2$). In our matrix notation, the
problem is then that of removing any tuple $a_i$ where there exists a $t_{ij}$ = TRUE for j < i. This is
equivalent to saying that we wish to remove any tuple corresponding to a row in the matrix T
which contains a "TRUE" in the lower triangle (left of the main diagonal). We could find the
appropriate $t_i$ by ORing across each row of T, as far as (but not including) the main diagonal.
Alternatively, we could set the main diagonal and the upper triangle all to FALSE, and then
take the OR across the whole row. This second scheme is what we will do.

For those $t_{ij}$ on the main diagonal and in the upper triangle ($i \le j$), we set $t_{ij}^{\text{initial}}$ to FALSE.
This implies that $t_{ij}$ will be FALSE for $i \le j$, since the comparison array works by ANDing each
individual comparison result with the current value of $t_{ij}$. The accumulation processors in the
remove-duplicates array act identically to those in the intersection array. They form the OR
of each row of the matrix T. To produce A', we eliminate from A any row where the resulting
$t_i$ is TRUE, and keep the rest. (This is the opposite of the intersection operation, where we
keep those rows with TRUE $t_i$.)
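
The second scheme can be sketched as follows. The code models only the logic: the initial value of each $t_{ij}$ is FALSE on and above the main diagonal and TRUE below it, the comparison result is ANDed with that initial value, and the row is then ORed as in the intersection array. The function name is ours.

```python
def remove_duplicates(A):
    """Keep each tuple of multi-relation A unless an equal tuple precedes it."""
    n = len(A)
    kept = []
    for i in range(n):
        t_i = False                       # accumulated OR across row i of T
        for j in range(n):
            t_initial = j < i             # FALSE on the diagonal and upper triangle
            t_ij = t_initial and A[i] == A[j]
            t_i = t_i or t_ij
        if not t_i:                       # TRUE rows are the duplicates to drop
            kept.append(A[i])
    return kept


A = [(1, 2), (3, 4), (1, 2), (1, 2), (3, 4)]
A_prime = remove_duplicates(A)
```

Because only the lower triangle can hold TRUE, the first occurrence of each tuple always survives and every later copy is removed.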

Our  remove-duplicates  array  can  be  used  to  implement  the  following  relational  operations: 

Union 

The union $C = A \cup B$ of two union-compatible relations, A and B, is the relation containing
all tuples in either A or B, without duplicates. It is straightforward to form $A \cup B$ by applying
the remove-duplicates operation to the concatenation A+B of A and B:

C = remove-duplicates(A + B).

In practice, this means that we first form the concatenation of A and B as we retrieve them.
We  then  put  the  concatenation  through  both  sides  of  the  remove-duplicates  array,  and  what 
comes  out  is  a  bit-string,  indicating  which  tuples  of  the  concatenation  should  be  in  the  union. 
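
A short sketch of the union as described, with hypothetical function names. The bit-string marks which tuples of the concatenation have no equal predecessor, which is what the remove-duplicates array reports.

```python
def keep_bits(M):
    """Bit i is True when tuple i of multi-relation M has no equal predecessor."""
    return [not any(M[i] == M[j] for j in range(i)) for i in range(len(M))]


def union(A, B):
    """Form A + B, then keep the tuples selected by the bit-string."""
    concatenation = list(A) + list(B)
    bits = keep_bits(concatenation)
    result = [t for t, keep in zip(concatenation, bits) if keep]
    return result, bits


A = [(1, 2), (3, 4)]
B = [(3, 4), (5, 6)]
C, bits = union(A, B)
```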

Projection 

The projection operation is similarly easy, with our remove-duplicates operation. We speak
of the projection of a relation A over a column, or list of columns, f. (Usually, f is of the form
"first column, second column, fifth column," or "name column, salary column, children column.")
The projection is produced by first finding for each tuple $a_i$ of A, the corresponding (smaller)
tuple which contains only those columns from $a_i$ that have been specified in f; this can
be done conveniently during the time when the original tuples are retrieved from storage.
The set $A_f$ of the resulting smaller tuples (a multi-relation in general) is then transformed
into a relation by removing duplicate tuples. This is precisely the function performed by our
remove-duplicates array. (Duplicates may occur in $A_f$ since we are taking the projection of a
relation which may contain tuples that differ only in columns that are not in f.)
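
The two stages of projection can be sketched as follows. The column list f is given here as 0-based column positions, which is our convention for the example; the function name is also ours.

```python
def project(A, f):
    """Project relation A over the columns listed in f (0-based positions)."""
    # Stage 1: form the smaller tuples; the result A_f may hold duplicates.
    A_f = [tuple(a[c] for c in f) for a in A]
    # Stage 2: remove-duplicates, keeping a tuple only if no equal tuple precedes it.
    return [t for i, t in enumerate(A_f) if t not in A_f[:i]]


A = [(1, 10, 100), (1, 20, 100), (2, 10, 100)]
P = project(A, [0, 2])
```

The first two tuples of A differ only in the column that is dropped, so they collapse to a single tuple in the projection.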
6. Arrays for Join


6.1  The  Join  Operation 

We illustrate the join operation by describing a special case: the join over a single column.
The more general case is sketched later in this section. The join, C, of two relations, A and B,
over columns $C_A$ and $C_B$, respectively, is written $C = A \bowtie_{C_A = C_B} B$.* The join, C, is the set of
tuples, $c_{ij}$, such that $c_{ij} = a_i \mid b_j$, where $a_{i,C_A} = b_{j,C_B}$, for $a_i \in A$ and $b_j \in B$. (For the join
to be well-defined, columns $C_A$ and $C_B$ must be drawn from the same underlying domain.)
The operator "|" is defined to be the concatenation of its two arguments, with the
exception that only one of $a_{i,C_A}$ and $b_{j,C_B}$ is included in the concatenation.

Intuitively, we check all pairs of tuples, $a_i$ and $b_j$, taken from relation A and B,
respectively. Where they match in the columns specified by $C_A$ and $C_B$, we concatenate the
two tuples. After removing one of the two matching columns (to eliminate redundancy), we
add the concatenation to the join, relation C.


6.2  The  Join  Array 

We can formulate the results of a join again in terms of a matrix. Let the matrix T be
defined as

$$
t_{ij} =
\begin{cases}
\text{TRUE} & \text{if } a_{i,C_A} = b_{j,C_B}, \\
\text{FALSE} & \text{otherwise.}
\end{cases}
$$

That is, $t_{ij}$ is true if and only if $a_i$ and $b_j$ match in the specified columns.

If we have the matrix T, it is straightforward to generate the relation C. For each $t_{ij}$ that
has the value TRUE (and for only those $t_{ij}$), we simply retrieve $a_i$ and $b_j$, and concatenate
them, removing the redundant column. The size of the join, |C|, might be as large as the
product |A||B|. (This happens in the degenerate case where all tuples in A match all tuples in
B in the specified columns.) However, for most applications the number of TRUE $t_{ij}$'s in T is
far  less  than  this  product.  Therefore,  we  can  usually  generate  C  fast,  provided  we  can 
produce  T  quickly.  A  fast  way  of  producing  T  is  the  concern  of  this  section. 
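
The two-stage view (first the matrix T, then the relation C) can be sketched in Python as follows. The function names `match_matrix` and `join_from_matrix`, the 0-based column indices, and the sample data are assumptions made for illustration.

```python
def match_matrix(A, B, ca, cb):
    """T[i][j] is True exactly when A[i] and B[j] agree in the join columns."""
    return [[a[ca] == b[cb] for b in B] for a in A]


def join_from_matrix(A, B, T, cb):
    """Generate C by visiting only the TRUE entries of T."""
    C = []
    for i, row in enumerate(T):
        for j, t in enumerate(row):
            if t:
                # Retrieve the two tuples and drop the redundant column of B[j].
                C.append(A[i] + B[j][:cb] + B[j][cb + 1:])
    return C


A = [(1, "x", 7), (2, "y", 8), (3, "z", 7)]
B = [(7, "p"), (9, "q")]
T = match_matrix(A, B, ca=2, cb=0)
C = join_from_matrix(A, B, T, cb=0)
true_entries = sum(t for row in T for t in row)
```

Here |C| equals the number of TRUE entries in T, which is the quantity that bounds the cost of the second stage.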

¹ Actually, authors differ as to whether the redundant column appears in the join. For example, Date [2] includes it,
but Codd's original paper [1] omits it.

Consider the linear array of processors in figure 6-1. We use this array to produce the
matrix T. The column $C_A$ of relation A (column 3 in the example in the picture) is input to the
processor array from its top, and moves down. Similarly, the column $C_B$ of B (column 1 in the
example) is sent through the array from bottom to top. As the two columns "pass through"
each other, each $a_{i,C_A}$ meets each $b_{j,C_B}$. (We send the columns through the array in such
a way that each element follows its predecessor after two time steps so that all pairs of
$a_{i,C_A}$ and $b_{j,C_B}$ meet.) When $a_{i,C_A}$ meets $b_{j,C_B}$, a simple comparison suffices to determine
the value of $t_{ij}$. These $t_{ij}$ are collected at the right of the array. (In the figure, the $t_{ij}$ are
shown coming out from the array.) Unlike some of the operations discussed earlier, here we
are interested in the $t_{ij}$ individually, and do not perform further accumulation operations on
them.

Figure 6-1: Join array.
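
The counter-flowing schedule can be checked with a small time-stepped simulation in Python. This is a sketch under stated assumptions, not a description of the actual chip: the array length of 2*max(|A|,|B|) - 1 processors, the simultaneous start of the two streams, and the name `systolic_join_matrix` are choices made here so that every pair meets inside the array.

```python
def systolic_join_matrix(col_a, col_b):
    """Simulate a linear array: col_a flows down, col_b flows up."""
    m, n = len(col_a), len(col_b)
    size = 2 * max(m, n) - 1          # assumed array length for this sketch
    down = [None] * size              # A-stream slots, index 0 is the top
    up = [None] * size                # B-stream slots
    T = [[None] * n for _ in range(m)]
    steps = 2 * max(m, n) + size      # enough steps for all pairs to meet
    for t in range(steps):
        # Every element advances one processor per time step.
        down = [None] + down[:-1]
        up = up[1:] + [None]
        # A new element of each column enters every second time step.
        if t % 2 == 0 and t // 2 < m:
            down[0] = (t // 2, col_a[t // 2])
        if t % 2 == 0 and t // 2 < n:
            up[-1] = (t // 2, col_b[t // 2])
        # A processor holding one element of each stream compares them.
        for p in range(size):
            if down[p] is not None and up[p] is not None:
                (i, a), (j, b) = down[p], up[p]
                T[i][j] = (a == b)    # t_ij leaves the array to the right
    return T


T = systolic_join_matrix([7, 8, 7], [7, 9])
```

With a spacing of two time steps and an odd array length, the distance between a descending and an ascending element is always even, so the two never slip past each other between processors.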


6.3  General  Case 


6.3.1  Join  Over  More  Than  One  Column 


In the general case, $C_A$ and $C_B$ specify more than one column. Their specifications are
constrained in the following way:

- the number of columns specified by $C_A$ must be the same as that specified by
$C_B$, and

- the respective columns in the specifications must be based on the same
underlying domains (up to a permutation, which can easily be handled).

Given this, $c_{ij} \, (= a_i \mid_{\{C_A, C_B\}} b_j) \in C$ only if $a_{i,C_A} = b_{j,C_B}$. This means that tuple $a_i$
must match tuple $b_j$ in all of the columns specified by $C_A$ and $C_B$. The concatenation
operator "$\mid_{\{C_A, C_B\}}$" is defined analogously: the concatenation includes only one copy of
the columns over which A and B are being joined.


The corresponding modification to the processor array in figure 6-1 is simple. Instead of
having one column of processors in the array, we have several columns: one for each
relational column over which A and B are to be joined. Each processor column is responsible
for comparing $a_i$ and $b_j$ in some particular column pair, and the result $t_{ij}$ is propagated to the
right, in essentially the same way as in the intersection array. When they reach the right
side of the processor array, the $t_{ij}$'s are used directly, without an intervening accumulation
array.
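
A functional Python sketch of the multi-column case follows. It models each processor column as one step that combines the incoming boolean with its own comparison; feeding an initial TRUE into the leftmost column, the 0-based column indices, and the function names are assumptions of this sketch.

```python
def multi_column_match(a, b, cols_a, cols_b):
    """Propagate t left to right across one row of processors."""
    t = True                              # assumed value entering from the left
    for ca, cb in zip(cols_a, cols_b):    # one processor column per column pair
        t = t and (a[ca] == b[cb])
    return t


def multi_column_join(A, B, cols_a, cols_b):
    if len(cols_a) != len(cols_b):
        raise ValueError("both specifications must name the same number of columns")
    drop = set(cols_b)                    # joined columns are kept only once
    C = []
    for a in A:
        for b in B:
            if multi_column_match(a, b, cols_a, cols_b):
                rest = tuple(v for k, v in enumerate(b) if k not in drop)
                C.append(a + rest)
    return C


A = [(1, 2, "u"), (1, 3, "v")]
B = [(2, 1, "p"), (3, 1, "q"), (2, 9, "r")]
# Columns (0, 1) of A are matched against columns (1, 0) of B, a permuted pairing.
C = multi_column_join(A, B, cols_a=(0, 1), cols_b=(1, 0))
```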


6.3.2  Non-Equi-Join 

The join operation we have been considering so far in this section is usually referred to as
the equi-join, since the join is performed on tuples for which the values in columns $C_A$ equal
those in columns $C_B$. This notion can be generalized to allow any sort of binary comparison
(e.g. ≤, >, etc.) to be done between the relevant columns of the two tuples.

The processor array to perform such an operation is easy to construct. For
greater-than-join, say, processors in the array would simply perform that comparison
between $C_A$ and $C_B$. The particular operation to be performed might be encoded in a few
bits, and passed along with the $a_{i,C_A}$ and $b_{j,C_B}$. Or, it might be preloaded into the array of
processors. This illustrates that some degree of programmability can often be provided to a
processor array at the expense of additional logic.
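
As a rough software analogue of this programmability, the comparison can be selected by a small operation code. In the Python sketch below the table `COMPARISONS`, its string codes, and the function `theta_matrix` are hypothetical names chosen for illustration; they stand in for the few bits that would select the operation in hardware.

```python
import operator

# Hypothetical operation codes standing in for the bits that select the comparison.
COMPARISONS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def theta_matrix(col_a, col_b, op):
    """T[i][j] is True when col_a[i] <op> col_b[j] holds."""
    compare = COMPARISONS[op]             # "preloaded" into every processor
    return [[compare(a, b) for b in col_b] for a in col_a]


T_gt = theta_matrix([5, 1], [3, 5], ">")  # greater-than-join
T_eq = theta_matrix([5, 1], [3, 5], "=")  # the equi-join as a special case
```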

7. Arrays for Division


Division is an operation between two relations (the dividend and the divisor) which
produces another relation (the quotient) as its result. The notation
"$C = A \div_{\{C_A, C_B\}} B$" means that C is the result of dividing A by B over the
columns $C_A$ of A and $C_B$ of B.

We show how to perform the division operation by a processor array for a restricted case
of division: A is a binary relation and B is a unary relation. Further, $C_A$ and $C_B$ specify only
single columns. The extension from this to the general case is straightforward (as in the
preceding section on the join).

Let the dividend A have columns $A_1$ and $A_2$, let the divisor B have column $B_1$, and let
$A_2$ and $B_1$ be defined on the same underlying domain (which makes their elements
comparable). Then the divide operation $C = A \div_{\{A_2, B_1\}} B$ produces a quotient C, having
column $C_1$ defined on the same domain as $A_1$; a value x will appear in $C_1$ if and only if the
pair (x,y) appears in A for every value y appearing in $B_1$ [2]. An example of the division
operation is shown in figure 7-1.
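
The restricted definition translates directly into a few lines of Python. This is a sketch only: the dividend is modelled as a set of (x, y) pairs, the divisor as a set of values, and the sample data below is made up rather than taken from figure 7-1.

```python
def divide(A, B):
    """Quotient of a binary relation A (pairs (x, y)) by a unary relation B."""
    divisor = set(B)
    quotient = set()
    for x in {x for x, _ in A}:
        ys = {y for z, y in A if z == x}  # all y paired with this x in A
        if divisor <= ys:                 # every divisor value appears with x
            quotient.add(x)
    return quotient


A = {("i", "a"), ("i", "b"), ("j", "a"), ("k", "a"), ("k", "b"), ("k", "c")}
B = {"a", "b"}
C = divide(A, B)
```

In this example j is excluded because the pair (j, b) is missing from A.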


Figure  7-1:  Example  of  relational  division 

Our systolic array for performing relational division consists of two modules: a dividend
array and a divisor array. Figure 7-2 illustrates how the division array works on the
example given in figure 7-1. The left-hand column of the two columns of processors in the
dividend array stores (distinct) elements appearing in column $A_1$, one element to a processor.
(These elements, {i, j, k} for this example, can be identified by the remove-duplicates
array.) Similarly, elements appearing in the divisor $B_1$ are preloaded into each row of
processors in the divisor array. In the figure, circled elements represent those elements
which are stored at processors.

Figure 7-2: Division array (in operation).

The dividend array computes, for each element x appearing in $A_1$, the set of y such that
$(x,y) \in A$. It works as follows. We take each pair $(z,y) \in A$, and pass it into the dividend array
from the bottom: the z into the left column and y into the right column. At each time step,
the z will be in the same processor as some preloaded element x, and the y will be following
one step behind it, in the column to the right. We compare z to x, and if they match, we
output a TRUE from the right side of the processor; otherwise, we produce a FALSE. This
boolean value t arrives at the processor in the right column, just as the associated y arrives
there. If t is true, then y is output from the right side of the processor. Otherwise, some
null value is output.

Thus for each x appearing in $A_1$, the non-null values, output from the dividend array at the
row whose left processor has x stored, are those y's such that $(x,y) \in A$. We see that if these
y's include all the elements in $B_1$, then x belongs to $C_1$. This is checked by the
corresponding row of processors, in the divisor array, which takes the y's as inputs. More
precisely, each processor of the row checks if the element it is storing matches any of the
y's passing from left to right along the row. If every processor of the row finds at least one
such match (which is checked by doing an AND across the row after the dividend passes
through the array), then the y's contain a, b, c, and d, and thus x belongs to $C_1$. This is the
essential idea behind the division array. One can already see that the division array provides
the same kind of rapid computations (using simple and regular structures) as other arrays
discussed earlier.
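
The row-by-row behaviour of the two modules can be mimicked in Python. The sketch below is a sequential stand-in for hardware in which all rows work in parallel; the use of `None` as the null value (so the y's themselves must not be `None`), the function name `division_array`, and the sample data are assumptions made for illustration.

```python
def division_array(A_pairs, B_values):
    """Functional model of the dividend array feeding the divisor array."""
    xs = []                                   # preloaded left-column elements
    for x, _ in A_pairs:                      # stand-in for the remove-duplicates array
        if x not in xs:
            xs.append(x)
    divisor = list(B_values)                  # preloaded into every divisor row
    quotient = []
    for x in xs:                              # one row of processors per distinct x
        seen = [False] * len(divisor)         # one match flag per divisor processor
        for z, y in A_pairs:                  # each pair streams past the row
            t = (z == x)                      # left dividend processor
            out = y if t else None            # right dividend processor
            for k, d in enumerate(divisor):   # out travels along the divisor row
                if out is not None and out == d:
                    seen[k] = True
        if all(seen):                         # AND across the row
            quotient.append(x)
    return quotient


A_pairs = [("i", "a"), ("i", "b"), ("j", "a"), ("k", "b"), ("k", "a"), ("k", "c")]
C1 = division_array(A_pairs, ["a", "b"])
```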

8. Remarks on Implementation and Performance

During  the  past  year,  we  have  designed  prototypes  of  several  special-purpose  chips  at 
CMU.  These  include  a  pattern-match  chip  [3],  an  image-processing  chip  [6],  and  a  tree 
processor  for  database  applications  [9].  The  pattern-match  chip  can  be  viewed  as  a 
scaled-down  version  of  the  comparison  array  in  Section  3.  (This  chip  has  been  fabricated, 
tested,  and  found  to  work.)  The  following  comments  and  projections  are  based  partly  on  our 
experience  with  the  pattern-match  chip. 

In  some  of  the  schemes  presented  in  this  paper,  it  is  the  case  that  only  half  of  the 
processors  in  a  systolic  array  are  busy  at  any  one  time.  This  inefficiency  can  be  avoided  in 
the  following  implementation:  rather  than  marching  two  relations  against  each  other  along 
the  systolic  array,  we  let  only  one  relation  move  while  the  other  remains  fixed.  Also,  for 
simplicity,  we  have  so  far  assumed  that  processors  in  systolic  arrays  operate  on  words.  In 
implementation,  each  word  processor  can  be  partitioned  into  bit  processors  to  achieve 
modularity  at  the  bit-level.  A  transformation  of  a  design  from  word-level  to  bit-level  is 
demonstrated in [3]. In general, many variations on the systolic arrays suggested are
possible.  All  of  these  are  equivalent,  and  differ  only  in  implementation  details. 

Below,  we  give  figures  for  a  reasonable  array  size  for  implementation.  While  such  an 
array  would  be  large  enough  for  many  applications,  it  is  also  possible  to  use  the  array  to 
solve problems that will not fit entirely on it. This calls for the technique of decomposing
problems. The technique is best illustrated by a simple example. In the intersection problem,
consider the matrix, T, of results. For a large problem, one can simply partition this matrix
into  sub-problems  small  enough  to  fit  on  the  array;  each  of  these  sub-problems  would 
generate  a  piece  of  the  matrix. 
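
A minimal Python sketch of this decomposition is given below. The parameter `block` stands for the number of tuples the fixed-size array can accept from each relation in one pass; that parameter, the function name, and the sample data are assumptions of the sketch.

```python
def blocked_match_matrix(A, B, block):
    """Build T piece by piece, one array-sized sub-problem at a time."""
    m, n = len(A), len(B)
    T = [[False] * n for _ in range(m)]
    for i0 in range(0, m, block):
        for j0 in range(0, n, block):
            sub_a = A[i0:i0 + block]          # at most `block` tuples of A
            sub_b = B[j0:j0 + block]          # at most `block` tuples of B
            # One pass of the array yields one piece of the matrix.
            for di, a in enumerate(sub_a):
                for dj, b in enumerate(sub_b):
                    T[i0 + di][j0 + dj] = (a == b)   # full tuple comparison
    return T


A = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]
B = [(3, "c"), (9, "z"), (5, "e")]
T = blocked_match_matrix(A, B, block=2)
```

The number of passes is the product of the numbers of blocks along each side, so here 3 x 2 = 6 sub-problems cover the 5 x 3 matrix.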

Intersection  is  one  of  the  most  computationally  demanding  relational  operations,  since  it 
requires full tuple comparisons between all possible pairs of tuples. We examine the speed
with  which  systolic  arrays  can  perform  intersection. 

We  make  the  following  assumptions  concerning  the  size  of  a  typical  relation: 

-  A  tuple  is  of  size  1500  bits  (or  about  200  characters). 

- A relation is of size $10^4$ tuples.

The  following  (conservative)  estimates  are  typical  of  results  that  have  been  achieved  with 
present  NMOS  technology: 

- A bit-comparator, the fundamental workhorse unit of our arrays, is about
240μ x 150μ in area. The comparison is performed (very conservatively!) in
about 350ns, including time for on-chip and off-chip data transfer.

- With present technology, chips are about 6000μ x 6000μ in area. Division gives
us about 1000 bit-comparators per chip. (Notice that this calculation is realistic
only if the design is repetitively regular, which is the case for our systolic
arrays.) We can assume that none of the comparators on a chip incurs delay due
to pin limitations; since the time for a comparison is large relative to off-chip
transfer time (<30ns), we can multiplex about 10 bits on a pin during a single
comparison.

-  It  is  practical  to  construct  devices  involving  a  few  thousand  chips.  We  assume 
1000 chips. This gives us the capability of performing $10^6$ comparisons in
parallel. 

Based on these assumptions, we can make the following performance predictions for
intersection. The intersection requires a total of $1.5 \times 10^{11}$ bit comparisons, since we need
1500 bit-comparisons for each of the $(10^4)^2$ tuple comparisons. The time to perform
intersection, therefore, is:

$$ (1.5 \times 10^{11} \text{ comparisons}) \times (350\,\text{ns} / 10^6 \text{ comparisons}), $$

which is about 50ms. We believe that this estimate is extremely conservative, even with
existing technology. If we assume instead, for example, 200ns/comparison, and 3000 chips,
we derive a figure of about 10ms.
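
The arithmetic behind these two estimates can be reproduced with a few lines of Python. The sketch takes the relation size as 10^4 tuples, the value consistent with the stated total of 1.5 x 10^11 bit comparisons; the function and variable names are illustrative.

```python
def comparators_per_chip(chip_side_um=6000, cmp_w_um=240, cmp_h_um=150):
    """Chip area divided by the area of one bit-comparator (microns)."""
    return (chip_side_um * chip_side_um) // (cmp_w_um * cmp_h_um)


def intersection_seconds(tuples, tuple_bits, compare_ns, chips):
    """Total bit comparisons, spread over all comparators working in parallel."""
    bit_comparisons = tuple_bits * tuples * tuples
    parallel = chips * comparators_per_chip()
    return bit_comparisons * compare_ns * 1e-9 / parallel


conservative = intersection_seconds(10**4, 1500, compare_ns=350, chips=1000)
optimistic = intersection_seconds(10**4, 1500, compare_ns=200, chips=3000)
```

The first call gives roughly 0.05 s and the second roughly 0.01 s, matching the 50ms and 10ms figures quoted above.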

The  processing  speed  obtainable  from  these  systolic  arrays  can  keep  up  with  the  data  rate 
achievable  with  the  fast  mass  storage  devices  available  in  present  technology.  For  example, 
a  moving-head  disk  rotates  at  about  3600  r.p.m.,  or  about  once  every  17ms.  Assume  that  we 
can  read  an  entire  cylinder  in  one  revolution,  as  in  some  of  the  proposed  database  machines 
(for a survey of these machines, see [4]). This is a rate of about 500,000 bytes in 17ms. In
a  comparable  period  of  time,  our  systolic  array  can  process  (for  example,  can  intersect)  two 
relations,  each  of  about  2  million  bytes. 
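
The figures in this comparison follow from simple arithmetic, sketched here in Python. The relation size of 10^4 tuples of 1500 bits is the assumption used in the performance estimate; the variable names are illustrative.

```python
rpm = 3600
revolution_ms = 60_000 / rpm                  # about 16.7 ms per revolution
cylinder_bytes = 500_000                      # one cylinder, as quoted in the text
disk_bytes_per_ms = cylinder_bytes / revolution_ms

relation_bytes = 10**4 * 1500 / 8             # one relation, about 1.9 million bytes
```

One relation of this size is therefore close to the "about 2 million bytes" quoted, and several times the size of one cylinder.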

9. Remarks on the Organization of an Integrated Systolic System

Systolic arrays introduced in preceding sections are capable of rapid processing of
individual relational database operations. To process all of the operations required in a single
transaction or a set of transactions, an integrated system containing several systolic arrays is
needed. Many strategies are possible for the interconnection of the systolic devices. To
decide  which  interconnection  strategy  to  choose,  one  must  consider  the  system  requirements: 

-  High  capacity  for  data  transfer.  As  described  in  the  last  section,  it  is  feasible 
that  a  systolic  array  may  process  hundreds  of  thousands  of  bytes  per 
millisecond. 

-  Flexibility  and  generality.  The  execution  order  of  systolic  devices  varies  greatly 
from  one  transaction  to  another  transaction.  Relations  may  have  to  be 
decomposed  to  fit  the  (fixed)  sizes  of  systolic  arrays.  Results  from  subrelations 
must be stored outside the systolic arrays before they are finally combined.

One  organization  that  seems  to  match  the  system  requirements  is  the  crossbar  switch 
interconnection  depicted  in  Figure  9-1.  Typically,  the  system  works  as  follows.  Initially,  the 
relevant  relations  are  read  from  disks  into  memories.  (Disks  with  "logic-per-track" 
capabilities  [8]  can  of  course  be  incorporated  into  the  system,  so  that  some  simple  queries 
never  have  to  be  processed  outside  the  disks.)  Then  the  crossbar  switch  is  configured  so 
that  the  relevant  memories  are  connected  to  the  systolic  array  that  will  perform  the  first 
operation  of  the  transaction  in  question.  The  data  is  pipelined  from  the  memories  through  the 
switch  and  through  the  processor  array.  The  output  of  the  array  is  pipelined  back  into 
another  memory.  This  is  repeated  for  each  relational  operation  in  the  transaction.  Due  to 
the  crossbar  structure,  several  operations  may  be  run  concurrently.  The  final  results  are 
eventually  returned  to  the  disk  (or  a  user’s  terminal,  or  printer,  etc.)  from  the  memory  in 
which  they  reside. 
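
The flow of a transaction through such a system can be caricatured in Python. Everything in this sketch is hypothetical: the memory names, the `ARRAYS` table, the plan format, and the software stand-ins for the intersection and union arrays are invented for illustration, and the sketch runs operations one after another rather than concurrently.

```python
def intersect(r, s):
    """Software stand-in for an intersection array."""
    return [t for t in r if t in s]


def union(r, s):
    """Software stand-in for a union array (duplicates removed)."""
    return r + [t for t in s if t not in r]


ARRAYS = {"intersect": intersect, "union": union}


def run_transaction(memories, plan):
    """Each plan step names an array, two source memories, and a result memory."""
    for op, src1, src2, dst in plan:
        # "Configuring the switch": route the sources through the chosen
        # array and pipe its output back into another memory.
        memories[dst] = ARRAYS[op](memories[src1], memories[src2])
    return memories


memories = {"M1": [(1,), (2,)], "M2": [(2,), (3,)], "M3": [(3,)]}
plan = [("intersect", "M1", "M2", "M4"), ("union", "M4", "M3", "M5")]
result = run_transaction(memories, plan)
```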

In  the  future,  we  plan  to  perform  a  detailed  analysis  of  the  crossbar  scheme  and  a 
comparison  of  this  scheme  with  other  alternative  structures.  For  example,  Song  [9]  has 
suggested  the  use  of  a  tree  machine  for  database  applications.  The  leaf  nodes  of  the  tree 
machine  are  responsible  for  data  storage,  and  for  a  limited  amount  of  processing  of  the  data. 
The  tree  structure  itself  is  used  to  broadcast  instructions  and  data,  and  to  combine  results  of 
low-level  computations  on  the  data.  This  same  tree  machine  is  capable  of  performing  all 
database  operations.  A  detailed  comparison  of  these  and  other  database  machine  structures 
is  needed  in  order  to  understand  their  relative  merits. 